{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Проект e-learning (вариант 2)    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Для начала подгрузим данные из четырёх имеющихся таблиц и посмотрим на то, как их интерпретировать:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Файл **assessments.csv** - содержит в себе информацию об оценках. Каждый предмет в семестре включает ряд тестов с оценками, за которыми следует экзамен."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>assessment_type</th>\n",
       "      <th>date</th>\n",
       "      <th>weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1753</td>\n",
       "      <td>TMA</td>\n",
       "      <td>54.0</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1754</td>\n",
       "      <td>TMA</td>\n",
       "      <td>117.0</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1755</td>\n",
       "      <td>TMA</td>\n",
       "      <td>166.0</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1756</td>\n",
       "      <td>TMA</td>\n",
       "      <td>215.0</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  id_assessment assessment_type   date  weight\n",
       "0         AAA             2013J           1752             TMA   19.0    10.0\n",
       "1         AAA             2013J           1753             TMA   54.0    20.0\n",
       "2         AAA             2013J           1754             TMA  117.0    20.0\n",
       "3         AAA             2013J           1755             TMA  166.0    20.0\n",
       "4         AAA             2013J           1756             TMA  215.0    30.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assesments = pd.read_csv('/mnt/HC_Volume_18315164/home-jupyter/jupyter-a-kuksova/First_Project/assessments.csv')\n",
    "assesments.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**code_module** — идентификационный код предмета. Предметы, которые есть в семестрах. Каждый оценивается определенным образом.\n",
    "\n",
    "**code_presentation** — семестр, состоит из года и буквы(Идентификационный код). На этапе рассмотрения могу сделать предположение, что семестр состоит из нескольких предметов.\n",
    "\n",
    "**id_assessment** — тест (Идентификационный номер).\n",
    "\n",
    "**assessment_type** — тип теста. Существуют три типа оценивания: оценка преподавателя (TMA), \n",
    "компьютерная оценка (СМА), экзамен по курсу (Exam).\n",
    "\n",
    "**date** — информация об окончательной дате сдачи теста. Рассчитывается как количество дней с момента начала семестра. Дата начала семестра имеет номер 0 (ноль).\n",
    "\n",
    "**weight** — вес теста в % в оценке за курс. Обычно экзамены рассматриваются отдельно и имеют вес 100%. Сумма всех остальных оценок составляет 100%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(206, 6)\n",
      "code_module           object\n",
      "code_presentation     object\n",
      "id_assessment          int64\n",
      "assessment_type       object\n",
      "date                 float64\n",
      "weight               float64\n",
      "dtype: object\n",
      "code_module  code_presentation  id_assessment  assessment_type  date   weight\n",
      "False        False              False          False            False  False     195\n",
      "                                                                True   False      11\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(assesments.shape) # смотрим размерность датафрейма\n",
    "print(assesments.dtypes) # знакомимся с типами данных\n",
    "print(assesments.isna().value_counts()) # проверяем, где отсутствуют значения / колонка date - 11 отсутствующих значений"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Файл **studentAssessment.csv** — содержит результаты тестов студентов. Если учащийся не отправляет работу на оценку, результат не записывается в таблицу."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>id_student</th>\n",
       "      <th>date_submitted</th>\n",
       "      <th>is_banked</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1752</td>\n",
       "      <td>11391</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>78.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1752</td>\n",
       "      <td>28400</td>\n",
       "      <td>22</td>\n",
       "      <td>0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1752</td>\n",
       "      <td>31604</td>\n",
       "      <td>17</td>\n",
       "      <td>0</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1752</td>\n",
       "      <td>32885</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1752</td>\n",
       "      <td>38053</td>\n",
       "      <td>19</td>\n",
       "      <td>0</td>\n",
       "      <td>79.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_assessment  id_student  date_submitted  is_banked  score\n",
       "0           1752       11391              18          0   78.0\n",
       "1           1752       28400              22          0   70.0\n",
       "2           1752       31604              17          0   72.0\n",
       "3           1752       32885              26          0   69.0\n",
       "4           1752       38053              19          0   79.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "studAssesment = pd.read_csv('/mnt/HC_Volume_18315164/home-jupyter/jupyter-a-kuksova/First_Project/studentAssessment.csv')\n",
    "studAssesment.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**id_assessment** — тест (Идентификационный номер).\n",
    "\n",
    "**id_student** — идентификационный номер студента.\n",
    "\n",
    "**date_submitted** — дата сдачи теста студентом, измеряемая как количество дней с начала семестра.\n",
    "\n",
    "**is_banked** — факт перезачета теста с прошлого семестра (иногда курсы перезачитывают студентам,\n",
    "вернувшимся из академического отпуска).\n",
    "\n",
    "**score** — оценка учащегося в этом тесте. Диапазон составляет от 0 до 100. \n",
    "\n",
    "**Оценка ниже 40 неуспешная сдача теста!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(173912, 5)\n",
      "id_assessment       int64\n",
      "id_student          int64\n",
      "date_submitted      int64\n",
      "is_banked           int64\n",
      "score             float64\n",
      "dtype: object\n",
      "id_assessment  id_student  date_submitted  is_banked  score\n",
      "False          False       False           False      False    173739\n",
      "                                                      True        173\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(studAssesment.shape) # смотрим размерность датафрейма\n",
    "print(studAssesment.dtypes) # знакомимся с типами данных\n",
    "print(studAssesment.isna().value_counts()) # проверяем, где отсутствуют значения / колонка score - 173 отсутствующих значений"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Файл **studentRegistration.csv** — содержит информацию о времени, когда студент зарегистрировался для прохождения курса в семестре."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>id_student</th>\n",
       "      <th>date_registration</th>\n",
       "      <th>date_unregistration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>11391</td>\n",
       "      <td>-159.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>28400</td>\n",
       "      <td>-53.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>30268</td>\n",
       "      <td>-92.0</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>31604</td>\n",
       "      <td>-52.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>32885</td>\n",
       "      <td>-176.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  id_student  date_registration  \\\n",
       "0         AAA             2013J       11391             -159.0   \n",
       "1         AAA             2013J       28400              -53.0   \n",
       "2         AAA             2013J       30268              -92.0   \n",
       "3         AAA             2013J       31604              -52.0   \n",
       "4         AAA             2013J       32885             -176.0   \n",
       "\n",
       "   date_unregistration  \n",
       "0                  NaN  \n",
       "1                  NaN  \n",
       "2                 12.0  \n",
       "3                  NaN  \n",
       "4                  NaN  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "studReg = pd.read_csv('/mnt/HC_Volume_18315164/home-jupyter/jupyter-a-kuksova/First_Project/studentRegistration.csv')\n",
    "studReg.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**code_module** — предмет (идентификационный код).\n",
    "\n",
    "**code_presentation** — семестр (идентификационный код).\n",
    "\n",
    "**id_student** — идентификационный номер студента.\n",
    "\n",
    "**date_registration** — дата регистрации студента. Это количество дней, измеренное от начала семестра (например, отрицательное значение -30 означает, что студент зарегистрировался на прохождение курса за 30 дней до его начала).\n",
    "\n",
    "**date_unregistration** — дата отмены регистрации студента с предмета. У студентов, окончивших курс, это поле остается пустым."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(32593, 5)\n",
      "code_module             object\n",
      "code_presentation       object\n",
      "id_student               int64\n",
      "date_registration      float64\n",
      "date_unregistration    float64\n",
      "dtype: object\n",
      "code_module  code_presentation  id_student  date_registration  date_unregistration\n",
      "False        False              False       False              True                   22515\n",
      "                                                               False                  10033\n",
      "                                            True               False                     39\n",
      "                                                               True                       6\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(studReg.shape) # смотрим размерность датафрейма\n",
    "print(studReg.dtypes) # знакомимся с типами данных\n",
    "print(studReg.isna().value_counts()) # проверяем, где отсутствуют значения date_registration / date_unregistration — у студентов, окончивших курс, это поле остается пустым."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Файл **courses.csv** — содержит список предметов по семестрам."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>module_presentation_length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2014J</td>\n",
       "      <td>269</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BBB</td>\n",
       "      <td>2013J</td>\n",
       "      <td>268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BBB</td>\n",
       "      <td>2014J</td>\n",
       "      <td>262</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BBB</td>\n",
       "      <td>2013B</td>\n",
       "      <td>240</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  module_presentation_length\n",
       "0         AAA             2013J                         268\n",
       "1         AAA             2014J                         269\n",
       "2         BBB             2013J                         268\n",
       "3         BBB             2014J                         262\n",
       "4         BBB             2013B                         240"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "courses = pd.read_csv('/mnt/HC_Volume_18315164/home-jupyter/jupyter-a-kuksova/First_Project/courses.csv')\n",
    "courses.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**courses.csv** — файл содержит список предметов по семестрам.\n",
    "\n",
    "**code_module** — предмет (идентификационный код).\n",
    "\n",
    "**code_presentation** — семестр (идентификационный код).\n",
    "\n",
    "**module_presentation_length** — продолжительность семестра в днях."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(22, 3)\n",
      "code_module                   object\n",
      "code_presentation             object\n",
      "module_presentation_length     int64\n",
      "dtype: object\n",
      "code_module  code_presentation  module_presentation_length\n",
      "False        False              False                         22\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(courses.shape) # смотрим размерность датафрейма\n",
    "print(courses.dtypes) # знакомимся с типами данных\n",
    "print(courses.isna().value_counts()) # проверяем, где отсутствуют значения / таблица полная"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Обработка данных"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "По результатам рассмотрения данных можно заметить, что в каждой из таблиц находятся одинаковые колонки. Следовательно, таблицы можно попробовать объединить для дальнейшего удобства работы. Что я и сделаю :)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Буду использовать метод **merge** для табличного объединения, который соединяет строки в Dataframe на основе одного или нескольких ключей."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_data = assesments.merge(courses, on=['code_module', 'code_presentation'])\n",
    "all_data = all_data.merge(studReg, on=['code_module', 'code_presentation'])\n",
    "all_data = all_data.merge(studAssesment, on=['id_assessment', 'id_student'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>assessment_type</th>\n",
       "      <th>date</th>\n",
       "      <th>weight</th>\n",
       "      <th>module_presentation_length</th>\n",
       "      <th>id_student</th>\n",
       "      <th>date_registration</th>\n",
       "      <th>date_unregistration</th>\n",
       "      <th>date_submitted</th>\n",
       "      <th>is_banked</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>268</td>\n",
       "      <td>11391</td>\n",
       "      <td>-159.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>78.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>268</td>\n",
       "      <td>28400</td>\n",
       "      <td>-53.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>22</td>\n",
       "      <td>0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>268</td>\n",
       "      <td>31604</td>\n",
       "      <td>-52.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17</td>\n",
       "      <td>0</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>268</td>\n",
       "      <td>32885</td>\n",
       "      <td>-176.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAA</td>\n",
       "      <td>2013J</td>\n",
       "      <td>1752</td>\n",
       "      <td>TMA</td>\n",
       "      <td>19.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>268</td>\n",
       "      <td>38053</td>\n",
       "      <td>-110.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>19</td>\n",
       "      <td>0</td>\n",
       "      <td>79.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173907</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2014J</td>\n",
       "      <td>37437</td>\n",
       "      <td>TMA</td>\n",
       "      <td>173.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>269</td>\n",
       "      <td>2606765</td>\n",
       "      <td>-64.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>185</td>\n",
       "      <td>0</td>\n",
       "      <td>55.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173908</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2014J</td>\n",
       "      <td>37437</td>\n",
       "      <td>TMA</td>\n",
       "      <td>173.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>269</td>\n",
       "      <td>2620947</td>\n",
       "      <td>-23.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>179</td>\n",
       "      <td>0</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173909</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2014J</td>\n",
       "      <td>37437</td>\n",
       "      <td>TMA</td>\n",
       "      <td>173.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>269</td>\n",
       "      <td>2645731</td>\n",
       "      <td>-23.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>177</td>\n",
       "      <td>0</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173910</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2014J</td>\n",
       "      <td>37437</td>\n",
       "      <td>TMA</td>\n",
       "      <td>173.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>269</td>\n",
       "      <td>2648187</td>\n",
       "      <td>-129.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>172</td>\n",
       "      <td>0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173911</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2014J</td>\n",
       "      <td>37437</td>\n",
       "      <td>TMA</td>\n",
       "      <td>173.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>269</td>\n",
       "      <td>2684003</td>\n",
       "      <td>-28.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>169</td>\n",
       "      <td>0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>173912 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       code_module code_presentation  id_assessment assessment_type   date  \\\n",
       "0              AAA             2013J           1752             TMA   19.0   \n",
       "1              AAA             2013J           1752             TMA   19.0   \n",
       "2              AAA             2013J           1752             TMA   19.0   \n",
       "3              AAA             2013J           1752             TMA   19.0   \n",
       "4              AAA             2013J           1752             TMA   19.0   \n",
       "...            ...               ...            ...             ...    ...   \n",
       "173907         GGG             2014J          37437             TMA  173.0   \n",
       "173908         GGG             2014J          37437             TMA  173.0   \n",
       "173909         GGG             2014J          37437             TMA  173.0   \n",
       "173910         GGG             2014J          37437             TMA  173.0   \n",
       "173911         GGG             2014J          37437             TMA  173.0   \n",
       "\n",
       "        weight  module_presentation_length  id_student  date_registration  \\\n",
       "0         10.0                         268       11391             -159.0   \n",
       "1         10.0                         268       28400              -53.0   \n",
       "2         10.0                         268       31604              -52.0   \n",
       "3         10.0                         268       32885             -176.0   \n",
       "4         10.0                         268       38053             -110.0   \n",
       "...        ...                         ...         ...                ...   \n",
       "173907     0.0                         269     2606765              -64.0   \n",
       "173908     0.0                         269     2620947              -23.0   \n",
       "173909     0.0                         269     2645731              -23.0   \n",
       "173910     0.0                         269     2648187             -129.0   \n",
       "173911     0.0                         269     2684003              -28.0   \n",
       "\n",
       "        date_unregistration  date_submitted  is_banked  score  \n",
       "0                       NaN              18          0   78.0  \n",
       "1                       NaN              22          0   70.0  \n",
       "2                       NaN              17          0   72.0  \n",
       "3                       NaN              26          0   69.0  \n",
       "4                       NaN              19          0   79.0  \n",
       "...                     ...             ...        ...    ...  \n",
       "173907                  NaN             185          0   55.0  \n",
       "173908                  NaN             179          0   75.0  \n",
       "173909                  NaN             177          0   72.0  \n",
       "173910                  NaN             172          0   70.0  \n",
       "173911                  NaN             169          0   60.0  \n",
       "\n",
       "[173912 rows x 13 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_data # получаем общую таблицу"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Считаю, что данные подготовлены для дальнейшей работы."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 1. Сколько студентов успешно сдали только один курс? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Для того, чтобы выполнить задание, необходимо понять, что в данном случае является курсом. \n",
    "\n",
    "В данных фигурирует формулировка, что **\"Успешная сдача — это зачёт по курсу на экзамене\"**. Следовательно, нам необходимо отфильтровать колонку **assessment_type** и выбрать экзамен.\n",
    "А также необходимо учесть, что успешная сдача это **>= 40** набранных баллов.\n",
    "\n",
    "Попробуем сгруппировать по предметам и семестрам выборку из студентов, которые набрали за экзамен >= 40 баллов."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>id_student</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>504</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>878</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>485</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>842</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  id_student\n",
       "2         DDD             2013B         504\n",
       "3         DDD             2013J         878\n",
       "0         CCC             2014B         664\n",
       "4         DDD             2014B         485\n",
       "1         CCC             2014J        1019\n",
       "5         DDD             2014J         842"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "success_exam = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                        .groupby(['code_module', 'code_presentation'], as_index=False) \\\n",
    "                        .agg({'id_student':'count'}) \\\n",
    "                        .sort_values('code_presentation')\n",
    "success_exam"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Предположу, что одним курсом в данном случае является семестр (code_presentation). Тогда, получается, что у нас имеется **4** курса (2013B, 2013J, 2014B, 2014J).\n",
    "\n",
    "Возможно, что для завершения курсов **2014B** и **2014J**, можно было сдать один экзамен по предмету из предложенных, либо **CCC**, либо **DDD**. В то время, как для завершения **2013B** и **2013J** достаточно было сдать предмет **DDD**. \n",
    "\n",
    "Но что, если оба предмета были необходимы для завершения курса?\n",
    "Проверяем по времени длительности модуля."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>module_presentation_length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>240</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>269</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>262</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  module_presentation_length\n",
       "2         DDD             2013B                         240\n",
       "3         DDD             2013J                         261\n",
       "0         CCC             2014B                         241\n",
       "4         DDD             2014B                         241\n",
       "1         CCC             2014J                         269\n",
       "5         DDD             2014J                         262"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "success_exam_timing = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                        .groupby(['code_module', 'code_presentation'], as_index=False) \\\n",
    "                        .agg({'module_presentation_length':'mean'}) \\\n",
    "                        .sort_values('code_presentation')\n",
    "success_exam_timing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Курсы 2014B и 2014J длятся в среднем 250+ дней, поэтому предположение о том, что оба предмета необходимы для сдачи - неверное. При совмещении и сдаче двух экзаменов получается практически год обучения, что не сходится с продолжительностью курсов 2013B и 2013J c одним экзаменом.\n",
    "\n",
    "\n",
    "**Выбираю за основу то, что для завершения курсов 2014B и 2014J, можно было сдать один экзамен по предмету из предложенных.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Теперь, для решения задания необходимо сгруппировать офильтрованные данные по студентам и посчитать, сколько из них успешно сдали один курс:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3802"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "success_exam_1mod = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                        .groupby('id_student', as_index=False) \\\n",
    "                        .agg({'code_module':'count'})\n",
    "success_exam_1mod.query('code_module == 1').shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ:** 3802 студента успешно сдали только один курс."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 2. Выяви самый сложный и самый простой экзамен: найди курсы и экзамены в рамках курса, которые обладают самой низкой и самой высокой завершаемостью*\n",
    "\n",
    "*завершаемость = кол-во успешных экзаменов / кол-во всех попыток сдать экзамен\n",
    "\n",
    "\n",
    "Для того, чтобы понять, насколько прост или сложен тот или иной экзамен в рамках определенного курса, мне необходимо посчитать, сколько людей сдало экзамен успешно и сколько всего было попыток сдать экзамен."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 1. Посчитаем, сколько студентов успешно сдали экзамены, добавим идентификационный номер теста, чтобы видеть, какой оказался сложнее.**\n",
    "\n",
    "Фильтруем данные по успешным случаям сдачи, группируем по номеру теста, экзамену и курсу."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>successful</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25340</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>504</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>25354</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>878</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>24290</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>25361</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>485</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24299</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>25368</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>842</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_assessment code_module code_presentation  successful\n",
       "2          25340         DDD             2013B         504\n",
       "3          25354         DDD             2013J         878\n",
       "0          24290         CCC             2014B         664\n",
       "4          25361         DDD             2014B         485\n",
       "1          24299         CCC             2014J        1019\n",
       "5          25368         DDD             2014J         842"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "succ_exams = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                     .groupby(['id_assessment', 'code_module', 'code_presentation'], as_index=False) \\\n",
    "                     .agg({'id_student':'count'}).rename(columns={'id_student': 'successful'}) \\\n",
    "                     .sort_values('code_presentation')\n",
    "succ_exams"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 2. Посчитаем, сколько студентов вообще сдавали экзамены, включая неудачные попытки.**\n",
    "\n",
    "Фильтруем данные по всем экзаменационным сдачам, группируем по номеру теста, экзамену и курсу."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>all_attempts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25340</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>25354</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>968</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>24290</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>747</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>25361</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>524</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24299</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>25368</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>950</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_assessment code_module code_presentation  all_attempts\n",
       "2          25340         DDD             2013B           602\n",
       "3          25354         DDD             2013J           968\n",
       "0          24290         CCC             2014B           747\n",
       "4          25361         DDD             2014B           524\n",
       "1          24299         CCC             2014J          1168\n",
       "5          25368         DDD             2014J           950"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_exams = all_data.query('assessment_type == \"Exam\"') \\\n",
    "                     .groupby(['id_assessment', 'code_module', 'code_presentation'], as_index=False) \\\n",
    "                     .agg({'id_student':'count'}).rename(columns={'id_student': 'all_attempts'}) \\\n",
    "                     .sort_values('code_presentation')\n",
    "all_exams"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 3. Примёрджим таблицу с успешными сдачами к таблице со всеми сдачами по ключам с помощью merge.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>successful</th>\n",
       "      <th>all_attempts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>25340</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>504</td>\n",
       "      <td>602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>25354</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>878</td>\n",
       "      <td>968</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>24290</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>664</td>\n",
       "      <td>747</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>25361</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>485</td>\n",
       "      <td>524</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24299</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1019</td>\n",
       "      <td>1168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>25368</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>842</td>\n",
       "      <td>950</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_assessment code_module code_presentation  successful  all_attempts\n",
       "0          25340         DDD             2013B         504           602\n",
       "1          25354         DDD             2013J         878           968\n",
       "2          24290         CCC             2014B         664           747\n",
       "3          25361         DDD             2014B         485           524\n",
       "4          24299         CCC             2014J        1019          1168\n",
       "5          25368         DDD             2014J         842           950"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "attempts = succ_exams.merge(all_exams, on=['id_assessment', 'code_module', 'code_presentation'])\n",
    "attempts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 4. Мы помним, что завершаемость = кол-во успешных экзаменов / кол-во всех попыток сдать экзамен. Считаем!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_assessment</th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>successful</th>\n",
       "      <th>all_attempts</th>\n",
       "      <th>completeness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>25361</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>485</td>\n",
       "      <td>524</td>\n",
       "      <td>93.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>25354</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>878</td>\n",
       "      <td>968</td>\n",
       "      <td>91.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>24290</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>664</td>\n",
       "      <td>747</td>\n",
       "      <td>89.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>25368</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>842</td>\n",
       "      <td>950</td>\n",
       "      <td>89.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24299</td>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1019</td>\n",
       "      <td>1168</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>25340</td>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>504</td>\n",
       "      <td>602</td>\n",
       "      <td>84.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_assessment code_module code_presentation  successful  all_attempts  \\\n",
       "3          25361         DDD             2014B         485           524   \n",
       "1          25354         DDD             2013J         878           968   \n",
       "2          24290         CCC             2014B         664           747   \n",
       "5          25368         DDD             2014J         842           950   \n",
       "4          24299         CCC             2014J        1019          1168   \n",
       "0          25340         DDD             2013B         504           602   \n",
       "\n",
       "   completeness  \n",
       "3          93.0  \n",
       "1          91.0  \n",
       "2          89.0  \n",
       "5          89.0  \n",
       "4          87.0  \n",
       "0          84.0  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "attempts['completeness'] = (attempts.successful / attempts.all_attempts).round(2) * 100\n",
    "attempts.sort_values('completeness', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ:**\n",
    "\n",
    "На выходе получаем колонку completeness с процентом завершаемости.\n",
    "Из неё делаем вывод о том, что:\n",
    "\n",
    "- самый сложный экзамен с **id = 25340 (DDD), курс 2013B** (завершаемость = 84 %)\n",
    "- самый простой экзамен с **id = 25361 (DDD), курс 2014B** (завершаемость = 93 %)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 3. По каждому предмету определи средний срок сдачи экзаменов (под сдачей понимаем последнее успешное прохождение экзамена студентом)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Для выполнения задания необходимо обратить внимание на колонку **date_submitted** (дата сдачи теста студентом), которая измеряется как количество дней с начала семестра. Мы знаем, что семестр длится в среднем 250+ дней.\n",
    "\n",
    "Здесь мне снова пригодится отфильтрованный по успешности сдачи экзамена датафрейм.\n",
    "Посчитаем среднее количество дней для сдачи экзаменов с помощью **mean**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>date_submitted</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CCC</td>\n",
       "      <td>239.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>DDD</td>\n",
       "      <td>238.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  date_submitted\n",
       "0         CCC           239.0\n",
       "1         DDD           238.0"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_date_submitted = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                        .groupby('code_module', as_index=False) \\\n",
    "                        .agg({'date_submitted':'mean'})\n",
    "\n",
    "mean_date_submitted.round()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ:**\n",
    "- средний срок сдачи экзамена по предмету **ССС** - **239 дней**.\n",
    "- средний срок сдачи экзамена по предмету **DDD** - **238 дней**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 4. Выяви самые популярные предметы (ТОП-3) по количеству регистраций на них. А также предметы с самым большим оттоком (ТОП-3).\n",
    "\n",
    "Воспользуюсь тем же самым способом поэтапного распила и мёрджа как и во втором задании.\n",
    "Не забуду учесть, что в колонке **date_registration** имеются отсутствующие значения. Точно так же не забуду учесть, что в колонке **date_unregistration** имеются пустые значения, которые сигнализируют о том, что студент окончил курс. Всё остальное будем считать за отток. \n",
    "\n",
    "Всё необходимое нам находится в оригинальном датафрейме **studReg**. Воспользуюсь им, так как при использовании **all_data** были расхождения в расчетах. \n",
    "\n",
    "(Количество регистраций в **studReg** отличается от **all_data** - их больше и выглядит так, будто в **all_data** я что-то потеряла при мёрдже).\n",
    "\n",
    "В качестве \"количества регистраций\", будем считать уникальных студентов."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 1. Количество регистраций.**\n",
    "\n",
    "Фильтруем данные и не берем в учет отсутствующие значения. Группируем по предмету и считаем уникальных студентов, которые зарегистрированы."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>registration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>712</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BBB</td>\n",
       "      <td>7683</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>4244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>5840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EEE</td>\n",
       "      <td>2858</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>FFF</td>\n",
       "      <td>7387</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  registration\n",
       "0         AAA           712\n",
       "1         BBB          7683\n",
       "2         CCC          4244\n",
       "3         DDD          5840\n",
       "4         EEE          2858\n",
       "5         FFF          7387\n",
       "6         GGG          2525"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "registration_count = studReg.query('date_registration != \"NaN\"') \\\n",
    "                            .groupby('code_module', as_index = False) \\\n",
    "                            .agg({'id_student':'nunique'}).rename(columns={'id_student':'registration'})\n",
    "registration_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 2. Подсчёт оттока.**\n",
    "\n",
    "По аналогии с подсчетом количества регистраций."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>unregistration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BBB</td>\n",
       "      <td>2314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>1858</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EEE</td>\n",
       "      <td>693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>FFF</td>\n",
       "      <td>2249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GGG</td>\n",
       "      <td>287</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  unregistration\n",
       "0         AAA             116\n",
       "1         BBB            2314\n",
       "2         CCC            1858\n",
       "3         DDD            2065\n",
       "4         EEE             693\n",
       "5         FFF            2249\n",
       "6         GGG             287"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unregistration_count = studReg.query('date_unregistration != \"NaN\"') \\\n",
    "                            .groupby('code_module', as_index = False) \\\n",
    "                            .agg({'id_student':'nunique'}).rename(columns={'id_student':'unregistration'})\n",
    "unregistration_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 3. Примёрджим таблицу с количеством регистраций к таблице с количеством оттока с помощью merge по ключу.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>registration</th>\n",
       "      <th>unregistration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>712</td>\n",
       "      <td>116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BBB</td>\n",
       "      <td>7683</td>\n",
       "      <td>2314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>4244</td>\n",
       "      <td>1858</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>5840</td>\n",
       "      <td>2065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EEE</td>\n",
       "      <td>2858</td>\n",
       "      <td>693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>FFF</td>\n",
       "      <td>7387</td>\n",
       "      <td>2249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2525</td>\n",
       "      <td>287</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  registration  unregistration\n",
       "0         AAA           712             116\n",
       "1         BBB          7683            2314\n",
       "2         CCC          4244            1858\n",
       "3         DDD          5840            2065\n",
       "4         EEE          2858             693\n",
       "5         FFF          7387            2249\n",
       "6         GGG          2525             287"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_unreg = registration_count.merge(unregistration_count, on='code_module')\n",
    "reg_unreg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 4. Подсчитаем процент оттока: unregistration / registration.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>registration</th>\n",
       "      <th>unregistration</th>\n",
       "      <th>outflow</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BBB</td>\n",
       "      <td>7683</td>\n",
       "      <td>2314</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>FFF</td>\n",
       "      <td>7387</td>\n",
       "      <td>2249</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>5840</td>\n",
       "      <td>2065</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>4244</td>\n",
       "      <td>1858</td>\n",
       "      <td>44.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EEE</td>\n",
       "      <td>2858</td>\n",
       "      <td>693</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2525</td>\n",
       "      <td>287</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>712</td>\n",
       "      <td>116</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  registration  unregistration  outflow\n",
       "1         BBB          7683            2314     30.0\n",
       "5         FFF          7387            2249     30.0\n",
       "3         DDD          5840            2065     35.0\n",
       "2         CCC          4244            1858     44.0\n",
       "4         EEE          2858             693     24.0\n",
       "6         GGG          2525             287     11.0\n",
       "0         AAA           712             116     16.0"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_unreg['outflow'] = (reg_unreg.unregistration / reg_unreg.registration).round(2) * 100\n",
    "reg_unreg.sort_values('registration', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ 1:**\n",
    "\n",
    "Из данной таблицы делаем вывод по **ТОП-3** предметов по количеству регистраций: \n",
    "\n",
    "1) **BBB**\n",
    "\n",
    "2) **FFF**\n",
    "\n",
    "3) **DDD**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>registration</th>\n",
       "      <th>unregistration</th>\n",
       "      <th>outflow</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>4244</td>\n",
       "      <td>1858</td>\n",
       "      <td>44.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>5840</td>\n",
       "      <td>2065</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BBB</td>\n",
       "      <td>7683</td>\n",
       "      <td>2314</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>FFF</td>\n",
       "      <td>7387</td>\n",
       "      <td>2249</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EEE</td>\n",
       "      <td>2858</td>\n",
       "      <td>693</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAA</td>\n",
       "      <td>712</td>\n",
       "      <td>116</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GGG</td>\n",
       "      <td>2525</td>\n",
       "      <td>287</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module  registration  unregistration  outflow\n",
       "2         CCC          4244            1858     44.0\n",
       "3         DDD          5840            2065     35.0\n",
       "1         BBB          7683            2314     30.0\n",
       "5         FFF          7387            2249     30.0\n",
       "4         EEE          2858             693     24.0\n",
       "0         AAA           712             116     16.0\n",
       "6         GGG          2525             287     11.0"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_unreg.sort_values('outflow', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ 2:**\n",
    "\n",
    "Из данной таблицы делаем вывод по **ТОП-3 (4 :D)** предметов по оттоку:\n",
    "\n",
    "1) **CCC***, **DDD**\n",
    "\n",
    "2) **BBB**\n",
    "\n",
    "3) **FFF**\n",
    "\n",
    "*Смущает CCC - не лидер по количеству оттока. Только по процентному соотношению!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 5. Используя pandas, в период с начала 2013 по конец 2014 выяви семестр с самой низкой завершаемостью курсов и самыми долгими средними сроками сдачи курсов. \n",
    "* а разве можно было не pandas? :D\n",
    "\n",
    "Самый простой вариант решения данного задания снова заключается в поэтапном распиле. Начнем с самых долгих средних сроков сдачи курсов."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 1. Выявление самого долгого среднего срока сдачи курса.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>date_submitted</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2014J</td>\n",
       "      <td>243.680279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2014B</td>\n",
       "      <td>232.997389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013J</td>\n",
       "      <td>239.509112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013B</td>\n",
       "      <td>230.164683</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_presentation  date_submitted\n",
       "3             2014J      243.680279\n",
       "2             2014B      232.997389\n",
       "1             2013J      239.509112\n",
       "0             2013B      230.164683"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "long_time_mean = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                         .groupby('code_presentation', as_index=False) \\\n",
    "                         .agg({'date_submitted':'mean'}) \\\n",
    "                         .sort_values('code_presentation', ascending=False)\n",
    "long_time_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Делаем вывод из данной таблицы, что самый долгий семестр по среднему сроку сдачи курса - 2014J (244 дня)**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 2. Выявление семестра с самой низкой завершаемостью курсов.**\n",
    "\n",
    "Похожее мы делали на этапе с заданием 2. Поэтому выведу датафрейм сразу с результатами."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code_module</th>\n",
       "      <th>code_presentation</th>\n",
       "      <th>successful</th>\n",
       "      <th>all_attempts</th>\n",
       "      <th>completeness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014B</td>\n",
       "      <td>485</td>\n",
       "      <td>524</td>\n",
       "      <td>93.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013J</td>\n",
       "      <td>878</td>\n",
       "      <td>968</td>\n",
       "      <td>91.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014B</td>\n",
       "      <td>664</td>\n",
       "      <td>747</td>\n",
       "      <td>89.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2014J</td>\n",
       "      <td>842</td>\n",
       "      <td>950</td>\n",
       "      <td>89.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CCC</td>\n",
       "      <td>2014J</td>\n",
       "      <td>1019</td>\n",
       "      <td>1168</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DDD</td>\n",
       "      <td>2013B</td>\n",
       "      <td>504</td>\n",
       "      <td>602</td>\n",
       "      <td>84.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  code_module code_presentation  successful  all_attempts  completeness\n",
       "3         DDD             2014B         485           524          93.0\n",
       "1         DDD             2013J         878           968          91.0\n",
       "2         CCC             2014B         664           747          89.0\n",
       "5         DDD             2014J         842           950          89.0\n",
       "4         CCC             2014J        1019          1168          87.0\n",
       "0         DDD             2013B         504           602          84.0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Успешные экзамены\n",
    "succ_exams1 = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                     .groupby(['code_module', 'code_presentation'], as_index=False) \\\n",
    "                     .agg({'id_student':'count'}).rename(columns={'id_student': 'successful'}) \\\n",
    "                     .sort_values('code_presentation')\n",
    "\n",
    "# Все экзамены\n",
    "all_exams1 = all_data.query('assessment_type == \"Exam\"') \\\n",
    "                     .groupby(['code_module', 'code_presentation'], as_index=False) \\\n",
    "                     .agg({'id_student':'count'}).rename(columns={'id_student': 'all_attempts'}) \\\n",
    "                     .sort_values('code_presentation')\n",
    "\n",
    "# Объединение таблиц\n",
    "attempts1 = succ_exams1.merge(all_exams1, on=['code_module', 'code_presentation'])\n",
    "attempts1\n",
    "\n",
    "# Завершаемость = кол-во успешных экзаменов / кол-во всех попыток сдать экзамен\n",
    "attempts1['completeness'] = (attempts1.successful / attempts1.all_attempts).round(2) * 100\n",
    "attempts1.sort_values('completeness', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Делаем вывод из данной таблицы, что семестр с самой низкой завершаемостью курсов - 2013B (84 %)**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ответ:**\n",
    "- семестр с самой низкой завершаемостью курсов - **2013B**.\n",
    "- семестр с самыми долгими средними сроками сдачи курсов - **2014J**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_______________________________________________________________________________________________________________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Задание 6. Часто для качественного анализа аудитории используют подходы, основанные на сегментации. Используя python, построй адаптированные RFM-кластеры студентов, чтобы качественно оценить свою аудиторию. \n",
    "\n",
    "В адаптированной кластеризации можно выбрать следующие метрики: \n",
    "\n",
    "R - среднее время сдачи одного экзамена, \n",
    "\n",
    "F - завершаемость курсов, \n",
    "\n",
    "M - среднее количество баллов, получаемое за экзамен. \n",
    "\n",
    "Для анализа берем изначальную табличку **all_data**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 1. Считаем R - среднее время сдачи одного экзамена для каждого студента**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>mean_date_submitted</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>236.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>237.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4092</th>\n",
       "      <td>2694886</td>\n",
       "      <td>236.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4093</th>\n",
       "      <td>2694933</td>\n",
       "      <td>230.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4094</th>\n",
       "      <td>2695608</td>\n",
       "      <td>237.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4095</th>\n",
       "      <td>2697181</td>\n",
       "      <td>230.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4096</th>\n",
       "      <td>2698251</td>\n",
       "      <td>234.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4097 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  mean_date_submitted\n",
       "0          23698                243.0\n",
       "1          24213                236.0\n",
       "2          27116                243.0\n",
       "3          28046                237.0\n",
       "4          28787                243.0\n",
       "...          ...                  ...\n",
       "4092     2694886                236.0\n",
       "4093     2694933                230.0\n",
       "4094     2695608                237.0\n",
       "4095     2697181                230.0\n",
       "4096     2698251                234.0\n",
       "\n",
       "[4097 rows x 2 columns]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "R = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                        .groupby('id_student', as_index=False) \\\n",
    "                        .agg({'date_submitted':'mean'}) \\\n",
    "                        .rename(columns={'date_submitted':'mean_date_submitted'})\n",
    "R"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False    4097\n",
       "Name: mean_date_submitted, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Проверяем отсутствующие значения\n",
    "R.mean_date_submitted.isna().value_counts()\n",
    "# Отсутствуют"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 2. Считаем F - считаем завершаемость курсов**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Успешные экзамены каждого студента\n",
    "f_exc = all_data.query('assessment_type == \"Exam\" & score >= 40') \\\n",
    "                     .groupby('id_student', as_index=False) \\\n",
    "                     .agg({'score':'count'}).rename(columns={'score':'succ_try'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Все экзамены\n",
    "f_all = all_data.query('assessment_type == \"Exam\"') \\\n",
    "                     .groupby('id_student', as_index=False) \\\n",
    "                     .agg({'score':'count'}).rename(columns={'score':'all_try'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Объединение таблиц\n",
    "F = f_all.merge(f_exc, how='left', on='id_student')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>all_try</th>\n",
       "      <th>succ_try</th>\n",
       "      <th>completeness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  all_try  succ_try  completeness\n",
       "0          23698        1       1.0         100.0\n",
       "1          24213        1       1.0         100.0\n",
       "2          27116        1       1.0         100.0\n",
       "3          28046        1       1.0         100.0\n",
       "4          28787        1       1.0         100.0\n",
       "...          ...      ...       ...           ...\n",
       "4628     2694886        1       1.0         100.0\n",
       "4629     2694933        1       1.0         100.0\n",
       "4630     2695608        1       1.0         100.0\n",
       "4631     2697181        1       1.0         100.0\n",
       "4632     2698251        1       1.0         100.0\n",
       "\n",
       "[4633 rows x 4 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Завершаемость = кол-во успешных экзаменов / кол-во всех попыток сдать экзамен\n",
    "F['completeness'] = (F.succ_try / F.all_try) * 100\n",
    "F"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False    4097\n",
       "True      536\n",
       "Name: completeness, dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Проверяем на наличие NaN\n",
    "F.completeness.isna().value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Терять данные значения было бы нехорошо, поэтому заполним отсутствующие значения 0. И отбрасываем ненужные колонки all_try, succ_try\n",
    "F = F.fillna(0)\n",
    "F = F.drop(['all_try', 'succ_try'], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>completeness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  completeness\n",
       "0          23698         100.0\n",
       "1          24213         100.0\n",
       "2          27116         100.0\n",
       "3          28046         100.0\n",
       "4          28787         100.0\n",
       "...          ...           ...\n",
       "4628     2694886         100.0\n",
       "4629     2694933         100.0\n",
       "4630     2695608         100.0\n",
       "4631     2697181         100.0\n",
       "4632     2698251         100.0\n",
       "\n",
       "[4633 rows x 2 columns]"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 3. Считаем M - считаем среднее количество баллов, получаемое за экзамен**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>mean_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>58.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>96.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>44.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>44.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  mean_score\n",
       "0          23698        80.0\n",
       "1          24213        58.0\n",
       "2          27116        96.0\n",
       "3          28046        40.0\n",
       "4          28787        44.0\n",
       "...          ...         ...\n",
       "4628     2694886        69.0\n",
       "4629     2694933        73.0\n",
       "4630     2695608        73.0\n",
       "4631     2697181        80.0\n",
       "4632     2698251        44.0\n",
       "\n",
       "[4633 rows x 2 columns]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M = all_data.query('assessment_type == \"Exam\"') \\\n",
    "                     .groupby('id_student', as_index=False) \\\n",
    "                     .agg({'score':'mean'}).rename(columns={'score':'mean_score'})\n",
    "M"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False    4633\n",
       "Name: mean_score, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M.mean_score.isna().value_counts()\n",
    "# Пропущенные значения отсутствуют"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 4. Создаем категории для R, F и М**\n",
    "\n",
    "По изученным урокам понимаю, что в разделении на категории нам поможет функция **pd.cut()**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    4097.000000\n",
       "mean      238.466317\n",
       "std         5.632457\n",
       "min       229.000000\n",
       "25%       234.000000\n",
       "50%       241.000000\n",
       "75%       243.000000\n",
       "max       285.000000\n",
       "Name: mean_date_submitted, dtype: float64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# для определения промежутков R для наших значений будем использовать квантили, их можно посмотреть с помощью ф-ции describe\n",
    "R.mean_date_submitted.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([100.,   0.,  50.])"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# для определения промежутков F для наших значений будем использовать промежутки завершаемости.\n",
    "F.completeness.unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    4633.000000\n",
       "mean       65.117958\n",
       "std        20.470561\n",
       "min         0.000000\n",
       "25%        50.000000\n",
       "50%        66.000000\n",
       "75%        82.000000\n",
       "max       100.000000\n",
       "Name: mean_score, dtype: float64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# для определения промежутков M для наших значений будем использовать квантили, их можно посмотреть с помощью ф-ции describe\n",
    "M.mean_score.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Перед тем, как продолжить, объединим имеющиеся R, F, M**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>mean_score</th>\n",
       "      <th>completeness</th>\n",
       "      <th>mean_date_submitted</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>80.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>58.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>236.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>96.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>40.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>237.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>44.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>243.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>69.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>236.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>73.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>230.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>73.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>237.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>80.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>230.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>44.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>234.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  mean_score  completeness  mean_date_submitted\n",
       "0          23698        80.0         100.0                243.0\n",
       "1          24213        58.0         100.0                236.0\n",
       "2          27116        96.0         100.0                243.0\n",
       "3          28046        40.0         100.0                237.0\n",
       "4          28787        44.0         100.0                243.0\n",
       "...          ...         ...           ...                  ...\n",
       "4628     2694886        69.0         100.0                236.0\n",
       "4629     2694933        73.0         100.0                230.0\n",
       "4630     2695608        73.0         100.0                237.0\n",
       "4631     2697181        80.0         100.0                230.0\n",
       "4632     2698251        44.0         100.0                234.0\n",
       "\n",
       "[4633 rows x 4 columns]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM = M.merge(F, how='left', on='id_student').merge(R, how='left', on='id_student')\n",
    "RFM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 5. Разделяем на категории и присваиваем лейблы.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "RFM['R'] = pd.cut(RFM.mean_date_submitted,[0, 229.000000, 234.000000, 241.000000, 243.000000], labels=[1, 2, 3, 4])\n",
    "RFM['F'] = pd.cut(RFM.completeness,[0, 50.0, 100.0], labels=[1, 2])\n",
    "RFM['M'] = pd.cut(RFM.mean_score,[0, 50.000000, 66.000000, 82.000000, 100.000000], labels=[1, 2, 3, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>R</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  R  F  M\n",
       "0          23698  4  2  3\n",
       "1          24213  3  2  2\n",
       "2          27116  4  2  4\n",
       "3          28046  3  2  1\n",
       "4          28787  4  2  1\n",
       "...          ... .. .. ..\n",
       "4628     2694886  3  2  3\n",
       "4629     2694933  2  2  3\n",
       "4630     2695608  3  2  3\n",
       "4631     2697181  2  2  3\n",
       "4632     2698251  2  2  1\n",
       "\n",
       "[4633 rows x 4 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM = RFM[['id_student', 'R', 'F', 'M']]\n",
    "RFM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id_student  R      F      M    \n",
       "False       False  False  False    3275\n",
       "            True   False  False     822\n",
       "                   True   False     533\n",
       "                          True        3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Наблюдаем отсутствующие значения, заполняем нулями.\n",
    "RFM.isna().value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id_student       int64\n",
       "R             category\n",
       "F             category\n",
       "M             category\n",
       "dtype: object"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Заполнить сразу не можем, потому что нам необходимо поменять тип данных.\n",
    "RFM.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Сложно, но нашелся вот такой чит! Думаю, что использование pd.cut() было не самым удобным решением. Решение в статье показалось тяжелее\n",
    "# Оцениваю решение как \"из под палки\"\n",
    "RFM[['R', 'F', 'M']] = RFM[['R', 'F', 'M']].astype('float').astype('Int8').fillna(0)\n",
    "RFM[['R', 'F', 'M']] = RFM[['R', 'F', 'M']].astype('int64')\n",
    "pd.options.mode.chained_assignment = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id_student    int64\n",
       "R             int64\n",
       "F             int64\n",
       "M             int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 6. Собираем колонку RFM для оценки кластеров**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>R</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>RFM</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>423</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>322</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>424</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>223</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>221</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  R  F  M  RFM\n",
       "0          23698  4  2  3  423\n",
       "1          24213  3  2  2  322\n",
       "2          27116  4  2  4  424\n",
       "3          28046  3  2  1  321\n",
       "4          28787  4  2  1  421\n",
       "...          ... .. .. ..  ...\n",
       "4628     2694886  3  2  3  323\n",
       "4629     2694933  2  2  3  223\n",
       "4630     2695608  3  2  3  323\n",
       "4631     2697181  2  2  3  223\n",
       "4632     2698251  2  2  1  221\n",
       "\n",
       "[4633 rows x 5 columns]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM['RFM'] = RFM['R'].map(str) + RFM['F'].map(str) + RFM['M'].map(str)\n",
    "RFM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Проработав в школе достаточно большое количество времени, могу сказать, что для подбора кластеров стоит разделять студентов на более подробные категории:\n",
    "\n",
    "- Отличник\n",
    "- Кандидат в отличники\n",
    "- Хорошист\n",
    "- Не хватает мотивации\n",
    "- Троечник\n",
    "- Двоечник\n",
    "- Отчислен / Иная причина\n",
    "\n",
    "Между каждой основной категорией присутствует уточнение, которое поможет лучше понять обучающихся.\n",
    "\n",
    "Для распределения кластеров необходимо посмотреть, какое количество студентов в каком промежутке от максимального до минимального находится в уникальном RFM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RFM</th>\n",
       "      <th>id_student</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>424</td>\n",
       "      <td>244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>423</td>\n",
       "      <td>357</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>422</td>\n",
       "      <td>350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>421</td>\n",
       "      <td>163</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>412</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>411</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>324</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>323</td>\n",
       "      <td>305</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>322</td>\n",
       "      <td>247</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>321</td>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>312</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>311</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>224</td>\n",
       "      <td>309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>223</td>\n",
       "      <td>349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>222</td>\n",
       "      <td>308</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>221</td>\n",
       "      <td>193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>212</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>211</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>124</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>123</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>122</td>\n",
       "      <td>54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>121</td>\n",
       "      <td>37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>112</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>111</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>024</td>\n",
       "      <td>301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>023</td>\n",
       "      <td>195</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>022</td>\n",
       "      <td>204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>021</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>001</td>\n",
       "      <td>533</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>000</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    RFM  id_student\n",
       "29  424         244\n",
       "28  423         357\n",
       "27  422         350\n",
       "26  421         163\n",
       "25  412           1\n",
       "24  411           2\n",
       "23  324         178\n",
       "22  323         305\n",
       "21  322         247\n",
       "20  321         109\n",
       "19  312           2\n",
       "18  311          10\n",
       "17  224         309\n",
       "16  223         349\n",
       "15  222         308\n",
       "14  221         193\n",
       "13  212           4\n",
       "12  211           7\n",
       "11  124           8\n",
       "10  123          36\n",
       "9   122          54\n",
       "8   121          37\n",
       "7   112           1\n",
       "6   111           1\n",
       "5   024         301\n",
       "4   023         195\n",
       "3   022         204\n",
       "2   021         122\n",
       "1   001         533\n",
       "0   000           3"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM_count = RFM.groupby('RFM', as_index = False) \\\n",
    "                .agg({'id_student':'count'}) \\\n",
    "                .sort_values('RFM', ascending = False)\n",
    "RFM_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 7. Списки для кластеров**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Распределяем каждый кластер по полученным значениям:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "mark5 = ['424', '024', '324', '224', '124'] # отличник\n",
    "almost_mark_5 = ['323', '423', '123'] # кандидат в отличники\n",
    "mark4 = ['223', '422', '023','222', '322'] # хорошист\n",
    "motivation = ['122', '412', '212', '312', '022'] # не хватает мотивации\n",
    "mark3 = ['321', '421', '112', '411'] # троечник\n",
    "control = ['111', '211', '121', '221', '311'] # требуется постоянный контроль\n",
    "mark2 = ['000', '001', '021'] # отчисление / иные причины"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Создаем колонку Groups, куда будем записывать кластер, к которому принадлежит ученик, проверяем значения RFM:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "RFM['Groups'] = RFM['RFM'].apply(lambda x: 'отличник' if x in mark5\\\n",
    "                                         else 'кандидат в отличники' if x in almost_mark_5\\\n",
    "                                         else 'хорошист' if x in mark4\\\n",
    "                                         else 'не хватает мотивации' if x in motivation\\\n",
    "                                         else 'троечник' if x in mark3\\\n",
    "                                         else 'требуется постоянный контроль' if x in control\\\n",
    "                                         else 'отчисление / иные причины')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Получаем итоговый датасет с разбитием студентов по кластерам:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_student</th>\n",
       "      <th>R</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>RFM</th>\n",
       "      <th>Groups</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23698</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>423</td>\n",
       "      <td>кандидат в отличники</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24213</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>322</td>\n",
       "      <td>хорошист</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27116</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>424</td>\n",
       "      <td>отличник</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28046</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>321</td>\n",
       "      <td>троечник</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28787</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>421</td>\n",
       "      <td>троечник</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4628</th>\n",
       "      <td>2694886</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>323</td>\n",
       "      <td>кандидат в отличники</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4629</th>\n",
       "      <td>2694933</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>223</td>\n",
       "      <td>хорошист</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4630</th>\n",
       "      <td>2695608</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>323</td>\n",
       "      <td>кандидат в отличники</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4631</th>\n",
       "      <td>2697181</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>223</td>\n",
       "      <td>хорошист</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4632</th>\n",
       "      <td>2698251</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>221</td>\n",
       "      <td>требуется постоянный контроль</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4633 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      id_student  R  F  M  RFM                         Groups\n",
       "0          23698  4  2  3  423           кандидат в отличники\n",
       "1          24213  3  2  2  322                       хорошист\n",
       "2          27116  4  2  4  424                       отличник\n",
       "3          28046  3  2  1  321                       троечник\n",
       "4          28787  4  2  1  421                       троечник\n",
       "...          ... .. .. ..  ...                            ...\n",
       "4628     2694886  3  2  3  323           кандидат в отличники\n",
       "4629     2694933  2  2  3  223                       хорошист\n",
       "4630     2695608  3  2  3  323           кандидат в отличники\n",
       "4631     2697181  2  2  3  223                       хорошист\n",
       "4632     2698251  2  2  1  221  требуется постоянный контроль\n",
       "\n",
       "[4633 rows x 6 columns]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Этап 8. Построим визуализацию для наглядного распределения студентов по группам**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Groups</th>\n",
       "      <th>id_student</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>хорошист</td>\n",
       "      <td>1449</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>отличник</td>\n",
       "      <td>1040</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>кандидат в отличники</td>\n",
       "      <td>698</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>отчисление / иные причины</td>\n",
       "      <td>658</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>троечник</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>не хватает мотивации</td>\n",
       "      <td>265</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>требуется постоянный контроль</td>\n",
       "      <td>248</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                          Groups  id_student\n",
       "6                       хорошист        1449\n",
       "2                       отличник        1040\n",
       "0           кандидат в отличники         698\n",
       "3      отчисление / иные причины         658\n",
       "5                       троечник         275\n",
       "1           не хватает мотивации         265\n",
       "4  требуется постоянный контроль         248"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RFM_1 = RFM.groupby('Groups', as_index=False)\\\n",
    "            .agg({'id_student':'count'})\\\n",
    "            .sort_values('id_student', ascending=False)\n",
    "RFM_1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAABJ8AAAJNCAYAAACfq9RvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nOzdfbgkZ1kn4N9DhgCCECAjG5OwyUJEoiCEIYYPFQgbA6JBRIFFCRidVQFRFMWPi3ihuCK48RuMEAhrFhQEiS6KMXyjQCYhJgFUxhDIZBMYCcYPYCH47B9dA52Tc2bOGc57eubkvq/rXF311ltVT3dXV3f/TlV1dXcAAAAAYIRbLboAAAAAADYv4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwzJZFFzDK4Ycf3sccc8yiywAAAADYNC6++OJ/6u6ta5ln04ZPxxxzTHbs2LHoMgAAAAA2jar66FrncdodAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgmC2LLuBA9YDnvGrRJbBAF7/oKYsuAQAAADYFRz4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDDA2fquqcqvpEVV2xzLSfqKquqsOn8aqq36yqnVV1WVWdMNf39Kr68PR3+siaAQAAAFg/o498emWSU5c2VtXRSU5J8rG55kclOW76257kJVPfuyQ5M8k3JjkxyZlVdeehVQMAAACwLoaGT939jiTXLzPprCQ/laTn2k5L8qqeeU+Sw6rqiCTfmuSC7r6+uz+V5IIsE2gBAAAAcODZ8Gs+VdVpSa7p7r9dMunIJFfPje+a2lZqBwAAAOAAt2UjV1ZVX5HkZzM75W7E8rdndspe7n73u49YBQAAAABrsNFHPt0jybFJ/raqrkpyVJJLquo/JbkmydFzfY+a2lZqv5nuPru7t3X3tq1btw4oHwAAAIC12NDwqbsv7+6v6u5juvuYzE6hO6G7r0tyfpKnTL96d1KSG7r72iRvTnJKVd15utD4KVMbAAAAAAe4oeFTVb06yd8kuVdV7aqqM/bS/U1JrkyyM8nvJ/mRJOnu65P8YpKLpr/nT20AAAAAHOCGXvOpu5+0j+nHzA13kqev0O+cJOesa3EAAAAADLfhv3YHAAAAwC2H8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhmaPhUVedU1Seq6oq5thdV1d9V1WVV9YaqOmxu2s9U1c6q+vuq+ta59lOntp1V9dyRNQMAAACwfkYf+fTKJKcuabsgydd3932T/EOSn0mSqjo+yROTfN00z+9W1SFVdUiS30nyqCTHJ3nS1BcAAACAA9zQ8Km735Hk+iVtf9ndN06j70ly1DR8WpLXdPf/6+6PJNmZ5MTpb2d3X9ndn0vymqkvAAAAAAe4RV/z6fuT/Pk0fGSSq+em7ZraVmoHAAAA4AC3sPCpqn4uyY1JzlvHZW6vqh1VtWP37t3rtVgAAAAA9tNCwqeqemqSxyR5cnf31HxNkqPnuh01ta3UfjPdfXZ3b+vubVu3bl33ugEAAABYmw0Pn6rq1CQ/leQ7uvvTc5POT/LEqrpNVR2b5Lgk70tyUZLjqurYqjo0s4uSn7/RdQMAAACwdltGLryqXp3kYUkOr6pdSc7M7NftbpPkgqpKkvd09w919weq6o+SfDCz0/Ge3t1fmJbzjCRvTnJIknO6+wMj6wYAAABgfQwNn7r7Scs0v3wv/V+Q5AXLtL8pyZvWsTQAAAAANsCif+0OAAAAgE1M+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwzJZFFwDc3Meef59Fl8CC3P15ly+6BAAAgHXlyCcAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMMzQ8KmqzqmqT1TVFXNtd6mqC6rqw9Ptnaf2qqrfrKqdVXVZVZ0wN8/pU/8PV9XpI2sGAAAAYP2MPvLplUlOXdL23CQXdvdxSS6cxpPkUUmOm/62J3lJMgurkpyZ5BuTnJjkzD2BFQAAAAAHtqHhU3e/I8n1S5pPS3LuNHxuksfOtb+qZ96T5LCqOiLJtya5oLuv7+5PJbkgNw+0AAAAADgALeKaT3fr7mun4euS3G0aPjLJ1XP9dk1tK7UDAAAAcIBb6AXHu7uT9Hotr6q2V9WOqtqxe/fu9VosAAAAAPtpEeHTx6fT6TLdfmJqvybJ0XP9jpraVmq/me4+u7u3dfe2rVu3rnvhAAAAAKzNIsKn85Ps+cW605O8ca79KdOv3p2U5Ibp9Lw3Jzmlqu48XWj8lKkNAAAAgAPclpELr6pXJ3lYksOraldmv1r3K0n+qKrOSPLRJN8zdX9Tkkcn2Znk00meliTdfX1V/WKSi6Z+z+/upRcxBwAAAOAANDR86u4nrTDp5GX6dpKnr7Ccc5Kcs46lAQAAALABFnrBcQAAAAA2N+ETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhVh0+VdVDVtMGAAAAAHus5cin31plGwAAAAAkSbbsq0NVPSjJg5Nsrapnz026Y5JDRhUGAAAAwMFvn+FTkkOT3GHq+5Vz7f+S5PEjigIAAABgc9hn+NTdb0/y9qp6ZXd/dANqAgAAAGCTWM2RT3vcpqrOTnLM/Hzd/Yj1LgoAAACAzWEt4dNrk7w0ycuSfGFMOQAAAABsJmsJn27s7pes14qr6seT/ECSTnJ5kqclOSLJa5LcNcnFSb6vuz9XVbdJ8qokD0jyySRP6O6r1qsWAAAAAMa41Rr6/mlV/UhVHVFVd9nztz8rraojk/xokm3d/fWZ/WreE5O8MMlZ3X3PJJ9KcsY0yxlJPjW1nzX1AwAAAOAAt5bw6fQkz0ny15kdlXRxkh1fxrq3JLldVW1J8hVJrk3yiCSvm6afm+Sx0/Bp03im6SdXVX0Z6wYAAABgA6z6tLvuPna9Vtrd11TVi5N8LMlnkvxlZmHWP3f3jVO3XUmOnIaPTHL1NO+NVXVDZqfm/dN61QQAAADA+lv1kU9V9RVV9fPTL96lqo6rqsfsz0qr6s6ZHc10bJKvTnL7JKfuz7KWLHd7Ve2oqh27d+/+chcHAAAAwJdpLafdvSLJ55I8eBq/Jskv7ed6H5nkI929u7s/n+T1SR6S5LDpNLwkOWpax551HZ0k0/Q7ZXbh8Zvo7rO7e1t3b9u6det+lgYAAADAellL+HSP7v7VJJ9Pku7+dJL9ve7Sx5KcNB1NVUlOTvLBJG9N8vipz+lJ3jgNnz+NZ5r+lu7u/Vw3AAAAABtkLeHT56rqdkk6SarqHkn+3/6stLvfm9mFwy9JcvlUx9lJfjrJs6tqZ2bXdHr5NMvLk9x1an92kufuz3oBAAAA2FirvuB4kjOT/EWSo6vqvMxOk3vq/q64u8+cljnvyiQnLtP3s0m+e3/XBQAAAMBirOXX7i6oqkuSnJTZ6XbP6m6/NgcAAADAivYZPlXVCUuarp1u715Vd+/uS9a/LAAAAAA2g9Uc+fRr0+1tk2xL8reZHfl03yQ7kjxoTGkAAAAAHOz2ecHx7n54dz88syOeTujubd39gCT3T3LN6AIBAAAAOHit5dfu7tXdl+8Z6e4rktx7/UsCAAAAYLNYy6/dXVZVL0vyB9P4k5Nctv4lAQAAALBZrCV8elqSH07yrGn8HUlesu4VAQAAALBprDp86u7PJjlr+gMAAACAfVp1+FRVH0nSS9u7+7+sa0UAAAAAbBprOe1u29zwbZN8d5K7rG85AAAAAGwmq/61u+7+5NzfNd3960m+bWBtAAAAABzk1nLa3Qlzo7fK7EiotRw5BQAAAMAtzFrCo1+bG74xyUeSfM/6lgMAAADAZrKW8OmM7r5yvqGqjl3negAAAADYRFZ9zackr1tlGwAAAAAkWcWRT1X1tUm+Lsmdqupxc5PumNmv3gEAAADAslZz2t29kjwmyWFJvn2u/V+T/OCIogAAAADYHPYZPnX3G5O8saoe1N1/swE1AQAAALBJrOWaT99ZVXesqltX1YVVtbuqvndYZQAAAAAc9NYSPp3S3f+S2Sl4VyW5Z5LnjCgKAAAAgM1hLeHTrafbb0vy2u6+YUA9AAAAAGwiq7ng+B5/WlV/l+QzSX64qrYm+eyYsgAAAADYDFZ95FN3PzfJg5Ns6+7PJ/l0ktP2TK+q/7r+5QEAAABwMFvLaXfp7uu7+wvT8L9393Vzk1+4rpUBAAAAcNBbU/i0D7WOywIAAABgE1jP8KnXcVkAAAAAbALrGT4BAAAAwE2sZ/h01TouCwAAAIBNYMu+OlTV4/Y2vbtfP93utR8AAAAAtzz7DJ+SfPt0+1VJHpzkLdP4w5P8dZLXD6gLAAAAgE1gn+FTdz8tSarqL5Mc393XTuNHJHnl0OoAAAAAOKit5sinPY7eEzxNPp7k7utcDwAL9JDfesiiS2CB3v3Mdy+6BAAANqG1hE8XVtWbk7x6Gn9Ckr9a/5IAAAAA2CxWHT519zOmi49/09R0dne/YUxZAAAAAGwGaznyac8v27nAOAAAAACrss/wqare1d0Prap/TdLzk5J0d99xWHUAAAAAHNRW82t3D51uv3J8OQAAAABsJrdadAEAAAAAbF7CJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMFsWXQAAQJK8/Zu/ZdElsCDf8o63L7oEAGAgRz4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwCwufquqwqnpdVf1dVX2oqh5UVXepqguq6sPT7Z2nvlVVv1lVO6vqsqo6YVF1AwAAALB6izzy6TeS/EV3f22Sb0jyoSTPTXJhdx+X5MJpPEkeleS46W97kpdsfLkAAAAArNVCwqequlOSb07y8iTp7s919z8nOS3JuVO3c5M8dho+LcmreuY9SQ6rqiM2uGwAAAAA1mhRRz4dm2R3kldU1fur6mVVdfskd+vua6c+1yW52zR8ZJKr5+bfNbUBAAAAcABbVPi0JckJSV7S3fdP8u/50il2SZLu7iS9loVW1faq2lFVO3bv3r1uxQIAAACwfxYVPu1Ksqu73zuNvy6zMOrje06nm24/MU2/JsnRc/MfNbXdRHef3d3bunvb1q1bhxUPAAAAwOosJHzq7uuSXF1V95qaTk7ywSTnJzl9ajs9yRun4fOTPGX61buTktwwd3oeAAAAAAeoLQtc9zOTnFdVhya5MsnTMgvD/qiqzkjy0STfM/V9U5JHJ9mZ5NNTXwAAAAAOcAsLn7r70iTblpl08jJ9O8nThxcFAAAAwLpa1DWfAAAAALgFED4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGGah4VNVHVJV76+qP5vGj62q91bVzqr6w6o6dGq/zTS+c5p+zCLrBgAAAGB1Fn3k07OSfGhu/IVJzurueyb5VJIzpvYzknxqaj9r6gcAAADAAW5h4VNVHZXk25K8bBqvJI9I8rqpy7lJHjsNnzaNZ5p+8tQfAAAAgAPYIo98+vUkP5XkP6bxuyb55+6+cRrfleTIafjIJFcnyTT9hqk/AAAAAAewhYRPVfWYJJ/o7ovXebnbq2pHVe3YvXv3ei4aAAAAgP2wqCOfHpLkO6rqqiSvyex0u99IclhVbZn6HJXkmmn4miRHJ8k0/U5JPrl0od19dndv6+5tW7duHXsPAAAAANinhYRP3f0z3X1Udx+T5IlJ3tLdT07y1iSPn7qdnuSN0/D503im6W/p7t7AkgEAAADYD4v+tbulfjrJs6tqZ2bXdHr51P7yJHed2p+d5LkLqg8AAACANdiy7y5jdffbkrxtGr4yyYnL9Plsku/e0MIAAAAA+LIdaEc+AQAAALCJCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGCYLYsuAAAAFum3f+JPF10CC/SMX/v2RZcAsOk58gkAAACAYYRPAAAAAAzjtDsAAIAFeMH3Pn7RJbBAP/cHr1t0CbBhHPkEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADOPX7gAAAOAW5kMveMuiS2BB7v1zj9jwdTryCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDLCR8qqqjq+qtVfXBqvpAVT1rar9LVV1QVR+ebu88tVdV/WZV7ayqy6rqhEXUDQAAAMDaLOrIpxuT/ER3H5/kpCRPr6rjkzw3yYXdfVySC6fxJHlUkuOmv+1JXrLxJQMAAACwVgsJn7r72u6+ZBr+1yQfSnJkktOSnDt1OzfJY6fh05K8qmfek+Swqjpig8sGAAAAYI0Wfs2nqjomyf2TvDfJ3br72mnSdUnuNg0fmeTqudl2TW0AAAAAHMAWGj5V1R2S/HGSH+vuf5mf1t2dpNe4vO1VtaOqduzevXsdKwUAAABgfywsfKqqW2cWPJ3X3a+fmj++53S66fYTU/s1SY6em/2oqe0muvvs7t7W3du2bt06rngAAAAAVmVRv3ZXSV6e5EPd/T/nJp2f5PRp+PQkb5xrf8r0q3cnJblh7vQ8AAAAAA5QWxa03ock+b4kl1fVpVPbzyb5lSR/VFVnJPloku+Zpr0pyaOT7Ezy6SRP29hyAQAAANgfCwmfuvtdSWqFyScv07+TPH1oUQAAAACsu4X/2h0AAAAAm5fwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwwicAAAAAhhE+AQAAADCM8AkAAACAYYRPAAAAAAwjfAIAAABgGOETAAAAAMMInwAAAAAYRvgEAAAAwDDCJwAAAACGET4BAAAAMIzwCQAAAIBhhE8AAAAADCN8AgAAAGAY4RMAAAAAwwifAAAAABhG+AQAAADAMMInAAAAAIYRPgEAAAAwjPAJAAAAgGGETwAAAAAMI3wCAAAAYBjhEwAAAADDCJ8AAAAAGEb4BAAAAMAwB1X4VFWnVtXfV9XOqnruousBAAAAYO8OmvCpqg5J8jtJHpXk+CRPqqrjF1sVAAAAAHtz0IRPSU5MsrO7r+zuzyV5TZLTFlwTAAAAAHtxMIVPRya5em5819QGAAAAwAGqunvRNaxKVT0+yand/QPT+Pcl+cbufsZcn+1Jtk+j90ry9xte6OZxeJJ/WnQR3CLZ9lgk2x+LYttjkWx/LIptj0Wy/e2//9zdW9cyw5ZRlQxwTZKj58aPmtq+qLvPTnL2Rha1WVXVju7etug6uOWx7bFItj8WxbbHItn+WBTbHotk+9tYB9NpdxclOa6qjq2qQ5M8Mcn5C64JAAAAgL04aI586u4bq+oZSd6c5JAk53T3BxZcFgAAAAB7cdCET0nS3W9K8qZF13EL4fRFFsW2xyLZ/lgU2x6LZPtjUWx7LJLtbwMdNBccBwAAAODgczBd8wkAAACAg4zwCQBgA1TVravqkkXXAcAtV1Xdrqr+R1W9p6ourapHL7qmW5KqelJVvbeq3lVVX7foejaS0+4AADZAVT08yeO6+5mLrgWAW6aqelWSdyV5RXd/ftH1cMvhyKeDUFU9sKouq6rbVtXtq+oDVXWfqnpRVV1RVZdX1ROmvg+rqndU1f+pqr+vqpdW1a2maU+a+l5RVS+cW/6/zQ1fUVXHTMN/UFWPmYavqqrD59qvmIYPqaoXT/NdVlXPrKonTKn6zqq6YRp24fiDWFU9e3qOr6iqH5u2vUur6rqqumYafv7U92Fzz/t1VfWTU/tK29AvzPU5Zq79qVX123M1/HZVPXUfy/riPFX1xKp6c1XdeoMepk1hyXNw76r626o6evqnOoQAABJiSURBVBr/k6q6eNoHbZ+bZ34fsq2q3jY3/pPTdnBpVV1fVY+f2l85N/wDVdVzz+nhVfW5uf3In03td6iqC6vqkmlfdtrUvuL2uOR+fWaadmVVvXiF+3+TbX1fyz8Qt/cl7feqqhv3PNYbaS2P5TL3/9+m24fNPf93qap/nnv87llVfzVto5dU1T3m5lnuOXlAVb192obfXFVHTO1vq6pta1z3ParqL6ZlvbOqvnaFh+HUJH++zGMz/5p559x6vrh9TON/VlUPWzrP3PT59+zvrar3Tff796rqkGX6XzW9di6dbq+a2p9aVW+cHosPV9WZU/v8Nnrr6bWzZ9v64uO2ZJtb9rU9v6xp2uOr6pVL55mb/pNV9QsrPK6b2gqvk3fU2j/fnVJVfzO9Pl5bVXeY2ue3v/nna35fc/iS7WPZfcqeeWq2f353VZ2yQQ/TprKP18fWqvrjqrpo+nvIMvP/eFWdMw3fZ9oWvmLap1xTs8/of1dVj5j6fHvNjsR4f832o3er2dExl05/n5vbV2xbrv+0nNtX1TnTvuf99aX35bdO8/7btL1eWlXfsaTmp077h6+dxu89jT91Gj95Wubl0zpuU3v5jlF736+u6jGY67/Sfnif+7dasq/cXyvsB55fe/+ud7PX/PS6f1iS709ySVW9oaruXLP3sUvm1nfcnvFa5v2yqr5pquGD9aXPU5dO/R9YVX9ds/fj91XVV1bVodO69nxXvWqZ+zj/fn3pdD9/YZp2v5odqXXZnpqn9pu991fVefWlz5kfmYZ/qGbfnV8xrf/9NfuHUFaqrVb/WWzZ/ePcfF/8LFw3/dz3yGmb2ZYlaoV9aS3zOthHPcu+9qY6/te0fXy4qn5w6l+1TKawzPOz7GfcfRE+HYS6+6Ik5yf5pSS/muQPknxNkvsl+YYkj0zyopo+SCc5Mckzkxyf5B5JHldVX53khUkeMc33wKp67Fprqar7JPn6uabtSY5Jcr/uvm+S87r7D7v7fkl+IMk7u/t+3e3wzoNUVT0gydOSfGOSk5L8YJL/PT3HL01y1vQcP2+a5ZAkb5+bvnR5S7eh/0hS+1nb0mXtaX9kkmcl+S7/4dk/VXVkklcn+W/dffXU/P3d/YAk25L8aFXddRWLOiTJ707bw/nLrOe2SX4oySeWzLNrbj+yx2eTfGd3n5Dk4Ul+raqqu5+zl+1x3j9O/R6U5KnL1HKzbb2q7r+P5R/o2/svJvnQ/qzvy7Efj+VqHpefSfKxufHzkvxOd39DkgcnuXZqv9lzUrMQ+reSPH7ahs9J8oI13KWl6z47yTOnZf1kkt9dYb6HJ3nbSgutqm9Lcqc11LHScu6d5AlJHjLd7y8kefJKNU19Hr6k/cQk35Xkvkm+e5kPx9uTzAdge33OVnhtswrLvU6SPC9r+Hw3fTH4+SSPnPaZO5I8ex3KW26fcuskr81sX/+X67AObuo3MtsOHpjZa/RlK/S5Z1V9Z5JXJPnv3f3padpZ02f0lyd5zNT2riQndff9k7wmyU9192emffL9kvzfTPuK7t6xXP9pOT+X5C3dfWJm+5QXVdXtu3vPfmZHkidPy7nZZ4Ak78ssFMl0+97ki/uPVyZ5QnffJ7NfbP/h1XzHWGG/uqrHYJn6lrWP/dvSfeV+2cdnj+X2BSu95u+a5OgkPz09lpcnObO7/zHJDVV1v2mZT0vyipXeL7v7nVM9j870eaq771dVhyb5wyTPmt6PH5nkM0m+Ncmtu/vrc/P3m3nvnNvuzpprf9VU83331Dy13+y9v7ufPPc58znT8l6a5Omzh7Lvk+RJSc6dnrvV1rbenpdk516m32RfutLrYG8r2Mdr776ZvVc8KMnzpvePx2XlTGGvn3FXY8v+zMQB4flJLsrsy9ePJnlxkld39xeSfLyq3p7kgUn+Jcn7uvvKJKmqVyd5aJLPJ3lbd++e2s9L8s1J/mSNdfxSZi/+PR/aH5nkpd19Y5J09/X7fQ85UD00yRu6+9+TpKpen+Sbkrx/hf63y2w7XcnSbWhXZtvRcp5QVQ+dho/MbEe6t2UlyX2SPCXJ6d39Zb/530LdIclfZPaB8gNz7T86fbBNZh9kjkvyySS3q+m/X5k9/9fOzXOHJB/fy7qenuTcJD+xZJ7l9iWV5Jer6psz++J7ZJK7JbluNXcqyT2mOo/NbB+61Fq39eQA3t6n8OBWSS7eS32jrPWx3JVke1Xdqrv/Y+nEKQw9KckbpvGvTHJkd78hSbp7/jlY7jm5V2bB3QVVlcw+UM1vp+dV1Wfm5t/buu+Q2Qfe107LSpLbrFDz9XNfAJdOr8y+uP1yku+dm/TjVbVnfH5b3fM6qyRvT/Jjc/OcnOQBSS6aarpd1h76XNDdn5xqe31mz+GfTOO3z+yLye/mSwHoriT3z+yzyXKWe23fY25fcafpfuzxoqr6+cz2KXv9cH0LtpbPdzdm9sX03dM2cWiSv5mWc31VfU13/8My63hrVX0hs9fITexln/L7SY7o7vO+zPt3S7fS6+ORSY6f29/csaruML/P7+7/qNkRGpcl+b3ufvfccn+8qr4/yVflS1+0j0ryh9OXzEOTfGQfta3U/5Qk31FfOlLotknuntX/0+OiJPefvmTv+cL8/9u795i5ijqM49+nYNJW4cXWqlzEgiBXC1IEixAuaiIJEYog0FLBNEoEvGDQGBJjsVFQwRKLCgmgtRRSKQikIaVNL1oupZVLX2hpNYIELxEQaBXKTX7+MbPt6b7n7OV9d/u+heeTNNmePTs757xz5syZ+c0spDr7yUIZnUWqU65qlFiTenWg56CorH6rqiu7oawueIXya17A0xFRK0+zSB0ckDoyvyjpm6QBjCNofr+stx+pA2gVQERszPn6HzBSJVG4zUjqAXapz3OTe3+Zo0kdaUTEOklPkQI4GuWtUVusqn6sfeZ14FJK2qWSPkcq7+Mb5Le+Lm12HVTW1xXuiIhNwCZJS0l/76Mp71O4k8Zt3KWS3iR1DH4pp9uHI5+2X6NJD2Q7kSr2RuoX9urUQl9HkXryV3coPXtr2o00YlamrAzNJT1UPQbUT8+cWxgNmdtCWgAHAJOAS3Njxtr3AVLD7fgcUYFSyPmngAl5tOlhttRFxdHS+miLvUgPqWV2Bs4Erm3xM5OBMcD4/F3/onl9WFSLfNoVOEt5OuEADeXyPh34bktHMfiWAeuBRwsPX0XfIx1PK/ezsr+JgDW18xsRH4mI4vSgyYVzX9+Aqv/uYcCLhbQOjYgDSvLxGeDuBvk8i3Tc9Y3UGYW8LC9s35S3jSeNXhY7MQXMKuRnv4iY1uC7yzRqO3ydFO1VbIT+EJgqaTVpgKyo6tr+S+HYvlX3Xm2U/2ag3by/XbTTvhOpQ7FWJg6MiKn5vW8At+Vr7bN1n6uKjIPqOuXPwOr8cG/9V3V9DCNF6NT+lrtXDK7tS7pP7Fa3fUZEHEi6Jq/M22YCV+doivNofi+t2l+kqNta3vaMiHajbRfk9PtMUe6HRvXqQM9BTVX9BuV1ZTeU1QVV1/zGBuncCpxIigZ7MA9ANLtftmoh8ATwLLC0H5/vpkZ5a9QWq6ofaxF5kygvFzuQrunLmuSr3bq0UX1dpt0+gkZt3ONJHcYBTKlKwJ1P269rSTf8OaTw6uWkXtYdJI0hjXKtzPseIWkvpfm/Z5DCSlcCxyrNCd2BVDn/vv5LmphGChcsWgScJ2lHAEmj2j4yG+qWA6corR3wTmAiWz8QbZbL1qnAvWXvU1KGIuKliJiYQ1/bmZ7ZJ63stxExH5hX8b4193hE3EwK6b42jyT2AC9ExMtK6zN8vFkiknYhjagsrtjlImBmRLxWt/10YH7J/j3AMxHxutK8/Q+2djh9vEoa9Xp33faWyzoM+fJ+LGkkcptPucvaOpcR8WZETI2Ig3JDquhDwNgoTOeJiP8Af1OePq60FsjIBn+T9cAYSRPy/u9Qa784U/bdG4EnJZ2e05KkQ0o+W7reUzaM1AHw4xbysJUcabyBNKpdsxg4TdJ7c55GSWr3+vh0/twI4BS2nMOe/P8b6vKxLiKOzJ3R9WWz6tpuxb/Z+thsi3badyuAT0jaBzavzfNhgIi4LSIOjorp0BUa1Sk/IE3v+bbyujnWUQtJ92MgrYVTv0OOFPkZ6XlgtMrX+dsI1NZp6QH+nl+f00Ieqva/G/hqbicg6aMtpFVvNmmA5cbCtvXA2Fr5JT3cNntuaaVeHcg5qKmq30rryi4pqwtKr/lIs1JelXRM/uzmc5kjh+4Gfkmargnt3y/XA7tK+ljefydJO+Z71SZSp0tbU9siYgPwQn2eq+79DZJaTh4UzfXfnsD6geStiecpn2l2NnBXRDzX5PP1dWl/roNGTlZaB2s0aR2wVVT0KbTQxiUignTMlfdsdz5thyR9AXg9Im4CLieFwm0ghdauBpaQ5mrXevlXAVeTQl6fJE19+CfwHVLv7mpS7/Ydef8RSj/9eA8p4uCW/Lq+l/uBSPODi64jrYPRm0c/J3XquG1oiIiHSPONV5Lm4l8XEVVTZ2aTeu1vrXi/rAz1V7O0LgNOlDSuQ9/3tpPDndeRpsAsAHaU9DipHlrRQhILSSHuywsj7D8pvC+2bmwi6XzSegkX589cBxyjtFDpHOBwSY+Sppqta/OQalMaHgOWRkRv3fG2U9ZhaJf3fRnE6JF+nMtG9qe8420KaSpoL3Af8H4q/ib5IeE04Ef5XvUI6WGnv989mS1RP2uAk4tv5kbbPhFRVUZHALdGxIst5GHzZ/K9+gHSqPrmqKqIWEta62NhPh+LSBF+7VhJOm+9OW+1qQZ7AFfmxnqr+lzbLZie2x6XkKKqrK+W23eRpuGdC9ycy8T9pPLcXw3rlBwx8X3yFBfrqK+R7n29ktaS1hqqN4O0Ds6fgKnA5bXOaNKUs0dInSKX5G3TSO39B4FmD8SN9p9OWqemV9Ka/P+2RMQzeeDhmcK2V0jT127J9/w3ab7mTKN6td1zcEHh2egoti7XVfVbf+rK/iqrCxpd81OAq/L2cWwdrTqHdH4XQvv3y7z/GcDMvP8iYLikzwM7RcT1/TzGc0jTsXtJETa1PJfd+6v8AhiWy9Bc4NyIeLUDeat3ai4ri+kb1QtpiYiftpJQsS7t53XQSC/pXrECmB4R/yAtKVDWp9CsjTtf0n2kCPzZVV+o1EFlb1VKU2MujoiTmu1rZtZNkpZFxHF12+ZFROUvryn9ysmyiFhW2HYS8J6I+HV3cmrWWUprP5wdEWUPiUOO0loxh0fEhYOdFyvn9p3Z4ClrzwyWTtcFSut19UTE9jJN3/oht6//GxGlv/bcLV5w3MzMtpX6dWBg618yKTOPvgslP0TJgs5mQ1VE3EOaBmFmZtu/TkXIDCmSfkeaXn7CYOfF3poc+WRmZmZmZmZmZl3jNZ/MzMzMzMzMzKxr3PlkZmZmZmZmZmZd484nMzMzMzMzMzPrGnc+mZmZmQ2QpPdJuknSE5IelHS/pImDnS8zMzOzocCdT2ZmZmYDIEnA7cAfImLviBgPnAnsUbeff2XYzMzM3pbc+WRmZmY2MCcAr0XENbUNEfFURMyUdK6kOyUtARZLGiXpdkm9klZIGgcgaZqki2ufl/SYpLH53zpJcyQ9LmmepJF5n8slrc1pXbGtD9rMzMysVR6BMzMzMxuYg4CHGrx/GDAuIp6XNBN4OCJOkXQC8Bvg0Cbp7wdMjYh7Jd0AnC/pV8BEYP+ICEm7dOA4zMzMzLrCkU9mZmZmHSTp55JWS1qVNy2KiOfz66OB2QARsQQYLWnnJkk+HRH35tc35jQ2AK8A10s6FXi5owdhZmZm1kHufDIzMzMbmDWk6CYAIuIC4JPAmLzppRbSeIOt22XDC6+jbt+IiDeAI4B5wEnAgjbzbGZmZrbNuPPJzMzMbGCWAMMlfaWwbWTFvsuByQCSjgOei4iNwF/JHViSDgP2KnxmT0kT8utJwD2S3gX0RMRdwEXAIZ05FDMzM7POU0T9YJqZmZmZtUPSrsAM4EjgWVK00zXACODwiLgw7zcKuAHYmzRV7ssR0StpBHAHsDvwADABODEnvwD4IzAeWAtMAXry/sMBAVdExKzuH6mZmZlZ+9z5ZGZmZjZESRoLzI+Igwc5K2ZmZmb95ml3ZmZmZmZmZmbWNY58MjMzMzMzMzOzrnHkk5mZmZmZmZmZdY07n8zMzMzMzMzMrGvc+WRmZmZmZmZmZl3jziczMzMzMzMzM+sadz6ZmZmZmZmZmVnXuPPJzMzMzMzMzMy65v82y0Ga0yoIRQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 1440x720 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(20, 10))\n",
    "b_plot_rfm = sns.barplot(x=RFM_1.Groups, y=RFM_1.id_student)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Вывод"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "По итогам выполненного задания, можно увидеть, что учащиеся осваивают образовательную програму успешно. Наличие неуспевающих групп говорит о том, что необходимо постоянно контролировать студентов в их учебной деятельности, что поможет поднять общую успеваемость."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
